Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

نویسندگان

چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Code Generation of Optimized Distributed-Memory Dense Linear Algebra Kernels

Design by Transformation (DxT) is an approach to software development that encodes domain-specific programs as graphs and expert design knowledge as graph transformations. The goal of DxT is to mechanize the generation of highly optimized code. This paper demonstrates how DxT can be used to transform sequential specifications of an important set of Dense Linear Algebra (DLA) kernels, the level-...

متن کامل

Accelerating GPU Kernels for Dense Linear Algebra

Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting – a set of GPU specific optimiz...

متن کامل

Load Balancing Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-Dimensional Grids

متن کامل

Data Allocation Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-dimensional Grids

We study the implementation of dense linear algebra computations, such as matrix multiplication and linear system solvers, on two-dimensional (2D) grids of heterogeneous processors. For these operations, 2D-grids are the key to scalability and eÆciency. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these opera...

متن کامل

Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach

While the first two involve fundamental physical limitations that current technology trends are unlikely to overcome in the near term, the third is an obvious consequence of the first two, combined with the economic necessity of using many thousands of computational units to scale up to petascale and larger systems. More transistors and slower clocks require multicore designs and an increased p...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles

سال: 2018

ISSN: 1294-4475,1953-8189

DOI: 10.2516/ogst/2018047